Linked Data for Information Extraction Challenge 2014 Tasks and Results
Abstract. To make the web of linked data grow, information extraction methods are a good alternative to manual dataset curation, since an abundance of semi-structured and unstructured information can be harvested that way. At the same time, existing Linked Data sets can be used for training and evaluating such information extraction systems. In this paper, we introduce the Linked Data for Information Extraction Challenge 2014. Using the example of person data in Microformats, we show how training and testing data can be curated at large scale. Furthermore, we discuss the results achieved in the challenge, as well as open problems and future directions for the challenge.
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A Template-Based Information Extraction from Web Sites with Unstable Markup
This paper presents the results of work on crawling the CEUR Workshop Proceedings web site into a Linked Open Data (LOD) dataset in the framework of the ESWC 2014 Semantic Publishing Challenge. Our approach is based on an extensible template-dependent crawler and on DBpedia for linking extracted entities, such as the names of universities and countries.
Unstable markup: A template-based information extraction from web sites with unstable markup
This paper presents the results of work on crawling the CEUR Workshop Proceedings web site into a Linked Open Data (LOD) dataset in the framework of the Semantic Publishing Challenge 2014. Our approach is based on so-called "templates of web site blocks" and on DBpedia for crawling and linking extracted entities.
Precise Medication Extraction using Agile Text Mining
Agile text mining is widely used for commercial text mining in the pharmaceutical industry. It can be applied without building an annotated training corpus, so is well-suited to novel or one-off extraction tasks. In this work we wanted to see how efficiently it could be adapted for healthcare extraction tasks such as medication extraction. The aim was to identify medication names, associated do...
Semantic Publishing Challenge - Assessing the Quality of Scientific Output by Information Extraction and Interlinking
The Semantic Publishing Challenge series aims at investigating novel approaches for improving scholarly publishing using Linked Data technology. In 2014 we had bootstrapped this effort with a focus on extracting information from non-semantic publications – computer science workshop proceedings volumes and their papers – to assess their quality. The objective of this second edition was to improv...